No page navigation lock ? #823

Closed
@caddave

Description

I'm looking for a page navigation lock feature, or an alternative, like the one PhantomJS has: http://phantomjs.org/api/webpage/property/navigation-locked.html

Activity

aslushnikov

aslushnikov commented on Sep 25, 2017

@aslushnikov
Contributor

@davedmarketing no, there's no navigation lock for now. Could you please share your usage scenarios - why would you need one?

caddave

caddave commented on Sep 25, 2017

@caddave
Author

@aslushnikov I was looking into it for a site with unpredictable behavior: it redirected at random, or elements with links popped up at random and my script clicked on them accidentally.

GuilloOme

GuilloOme commented on Oct 23, 2017

@GuilloOme

Here is my use case (close to @davedmarketing's):

Context: Using puppeteer to inject a probe into a page and retrieve the URLs related to that page.
Requirements: Stay on the page until we have all the URLs.
Behavior: The probe clicks, hovers over, and sends keystrokes to every part of the page, and it's common to hit a "navigate away" action (through a window.location='url' or window.location.href='url' call).

GuilloOme

GuilloOme commented on Nov 10, 2017

@GuilloOme

I finally found a workaround for this (i.e. blocking navigation while retrieving the destination URL)… with a custom Chrome extension.

But, because of #659, it can't be used in headless mode (it needs headless: false). Hopefully, this will be resolved by #872.

Why a Chrome extension?
Short version: because of DOM isolation.
Long version:
At the page level, we can block navigation away by hijacking the window.onbeforeunload event, but we cannot find out what the destination is.
The only way (that I found) to prevent navigation and still learn the destination is the chrome.webRequest.onBeforeRequest event, available at the extension level, because it provides a way to manipulate the request before it is issued to the network (so before the navigation starts).

How does it work?

  1. On a request related to a "frame type" navigation, the extension blocks it.
  2. The extension notifies the content script that a URL has been blocked.
  3. The content script notifies the page of the blocked URL.
  4. puppeteer can then retrieve the URL from the page.

We need a content script here to pass messages between the extension and puppeteer (through the page).

Here is the snippet of the blocking logic in the extension:

// `tabs` and `_compareUrls` are defined elsewhere in the extension (see the full code).
function onBeforeRequestListener(details) {
    let currentTab = tabs[details.tabId]; // retrieving the current tab

    // blocking only when the tab is known, the request is for http, and the url is different
    if (currentTab && currentTab.url.startsWith('http') && _compareUrls(details.url, currentTab.url)) {
        console.warn(`Navigation to ${details.url} blocked.`);

        // message the page about this blocked url
        chrome.tabs.executeScript(details.tabId, {file: 'content.js'}, function() {
            chrome.tabs.sendMessage(details.tabId, {url: details.url});
        });

        // redirect to nowhere
        return {redirectUrl: 'javascript:void(0)'};
    }
}

Full code available: https://gist.github.com/GuilloOme/2bd651e5154407d2d2165278d5cd7cdb
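
For context, a listener like the one above can only block or redirect a request if it is registered as a blocking listener. The following is a minimal sketch of that registration, assuming a Manifest V2 extension with the webRequest and webRequestBlocking permissions. The helper name registerNavigationBlocker and its chromeApi parameter are hypothetical (inside an extension you would pass the global chrome object), and the filter shown here is an assumption, not necessarily what the gist uses.

```javascript
// Hypothetical helper: wires a listener such as onBeforeRequestListener
// into chrome.webRequest so that it sees frame navigations before they
// reach the network. `chromeApi` is the extension's global `chrome`
// object, passed in so the function can be exercised outside a browser.
function registerNavigationBlocker(chromeApi, listener) {
  chromeApi.webRequest.onBeforeRequest.addListener(
    listener,
    // Only top-level and iframe document requests, on any URL.
    { urls: ['<all_urls>'], types: ['main_frame', 'sub_frame'] },
    // 'blocking' lets the listener's return value cancel or redirect the request.
    ['blocking']
  );
}
```

In a background script this would be called once as registerNavigationBlocker(chrome, onBeforeRequestListener).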

aslushnikov

aslushnikov commented on Nov 15, 2017

@aslushnikov
Contributor

@GuilloOme interesting approach! You can also implement this idea using request interception. This way you won't need to run an extension.

GuilloOme

GuilloOme commented on Nov 15, 2017

@GuilloOme

@aslushnikov, correct me if I'm mistaken:
I did check it, but I couldn't find a way to differentiate a "navigation" request type from a "document" request type. The Chrome extension API gives me main_frame and sub_frame as request types, which is more precise than the document type provided by the DevTools Protocol. As I found out, the DevTools Protocol identifies the resource type document based only on the mimeType (see my answer on Stack Overflow).

sukhjitsingh-impactradius

sukhjitsingh-impactradius commented on Nov 21, 2017

@sukhjitsingh-impactradius

@aslushnikov rather than a navigation lock, I wanted to click and open a new tab for each element on the page and call setRequestInterception for each new page, but the request interception is not added in time to catch the initial redirect. I've attempted @GuilloOme's solution above using chrome.webRequest, but as he says, it lacks a way to connect the extension to the puppeteer code (or I'm not using it correctly). I've scoured the issues and Stack Overflow: is there a way to set request interception at the browser level? If not, do you have any suggestions?

const page = await browser.newPage();
await page.goto("someurl");
const elements = await page.$$('button'); // anchors, buttons, and any other elements to click
// targetcreated event fires once new page is opened
const newPagePromise = new Promise(fulfill => browser.once('targetcreated', target => fulfill(target.page())));
// click using middle button to open page in new tab
await elements[0].click({button:'middle'});
// wait for newPage to be created from above promise
const newPage = await newPagePromise;

// gather requests from newly opened tab/page
if(newPage != null){
    await newPage.setRequestInterception(true);
    newPage.on('request', interceptedRequest => {
        // request.url is a method in Puppeteer 1.0 and later
        console.log(`-------InterceptedRequest.url: ${interceptedRequest.url()}`);
        interceptedRequest.continue();
    });
}

^This is where I'm at right now, but again, hitting that newPage.on() too late.

GuilloOme

GuilloOme commented on Nov 21, 2017

@GuilloOme

@sukhjitsingh-impactradius, communication between the extension and the node process is not straightforward, but it is possible; I did that here.
To achieve that, you have to use messaging between the extension and the page (through content.js, see: l.22 of my gist). Then, you have to inject some code into the page DOM with page.evaluate() to handle the message in the page, and use page.exposeFunction() to return the result to puppeteer.
As I said, not straightforward but possible 😉
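
The puppeteer side of that bridge could be sketched roughly as follows. This is only an illustration: it assumes the content script forwards blocked URLs to the page with window.postMessage({blockedUrl: '...'}, '*'), which is a made-up message shape and may differ from what the gist does. The names bridgeBlockedUrls and reportBlockedUrl are hypothetical too; only page.exposeFunction() and page.evaluate() are real puppeteer calls.

```javascript
// Hypothetical bridge from page messages to the Node process.
// Assumes the content script calls window.postMessage({blockedUrl: '...'}, '*');
// the gist's actual message format may differ.
async function bridgeBlockedUrls(page, onBlockedUrl) {
  // Expose a Node-side callback on the page's window object.
  await page.exposeFunction('reportBlockedUrl', url => onBlockedUrl(url));
  // Install a page-side listener that forwards matching messages to it.
  await page.evaluate(() => {
    window.addEventListener('message', event => {
      if (event.data && typeof event.data.blockedUrl === 'string') {
        window.reportBlockedUrl(event.data.blockedUrl);
      }
    });
  });
}
```

Usage would be something like await bridgeBlockedUrls(page, url => blockedUrls.push(url)), called after the page has loaded, since the listener installed by page.evaluate() lives only in the current document.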

aslushnikov

aslushnikov commented on Nov 21, 2017

@aslushnikov
Contributor

I did check it but I couldn't find a way to differentiate a "navigation" request type from a "document" request type.

@GuilloOme ah indeed, you're right. My suggestion wouldn't work for you then.

@aslushnikov rather than a navigation lock I wanted to click and open a new tab for each element on the page and setrequestinterception for each new page, but the request interception is not added in time to catch the initial redirect.

@sukhjitsingh-impactradius that's a good use case, thank you. @caseq: this might be another point towards browser-wide request interception.

sukhjitsingh-impactradius

sukhjitsingh-impactradius commented on Jan 15, 2018

@sukhjitsingh-impactradius

@aslushnikov is there work being done for this request or is there possibly somewhere you think upstream where I can log a ticket for this, if it hasn't already?

aslushnikov

aslushnikov commented on Jan 16, 2018

@aslushnikov
Contributor

is there work being done for this request

@sukhjitsingh-impactradius not yet.

is there possibly somewhere you think upstream where I can log a ticket for this, if it hasn't already?

You can file a ticket on crbug.com against devtools protocol, it will help us with bookkeeping.

10 remaining items

Wildhoney

Wildhoney commented on Feb 8, 2019

@Wildhoney

@Qianlitp good thinking! Something like the following seems to work just fine:

page.on('dialog', dialog => dialog.dismiss());
page.evaluate(() => {
    window.onbeforeunload = function() {
        return 'You have unsaved changes!';
    }
});

@dave-dm although a good answer too, you can't prevent page navigation from onClick events, etc...

jiangfengming

jiangfengming commented on Feb 26, 2019

@jiangfengming

req.respond({ status: 204 }) can prevent navigation triggered by JavaScript.

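const url = 'https://www.example.com/'

// Assumes `page` is an existing Puppeteer page and that this runs inside an async function.
// Interception must be enabled, otherwise req.respond() and req.continue() throw.
await page.setRequestInterception(true)

page.on('request', req => {
  if (req.isNavigationRequest() && req.frame() === page.mainFrame() && req.url() !== url) {
    // no redirect chain means the navigation is caused by setting `location.href`
    req.respond(req.redirectChain().length
      ? { body: '' } // prevent 301/302 redirect
      : { status: 204 } // prevent navigation by js
    )
  } else {
    req.continue()
  }
})

await page.goto(url)

panthony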

panthony commented on Mar 13, 2019

@panthony

The solution suggested by @jiangfengming was working well until the new 1.13.0 release, where page.goto hangs until it times out, unfortunately.

aslushnikov

aslushnikov commented on Mar 15, 2019

@aslushnikov
Contributor

@panthony works just fine for me with 1.13.0. One correction, however, is to use request.abort instead of request.respond - this way we can properly abort the navigation.

Try running the following script and then manually navigating from "example.com" to any other website:

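const puppeteer = require('puppeteer');

(async() => {
  const browser = await puppeteer.launch({headless: false});
  const page = await browser.newPage();

  // Chrome reports the root URL with a trailing slash, so include it here;
  // otherwise the initial navigation would not match and would be aborted too.
  const url = 'https://example.com/';

  // Abort every main-frame navigation that does not target `url`.
  page.on('request', req => {
    if (req.isNavigationRequest() && req.frame() === page.mainFrame() && req.url() !== url) {
      req.abort('aborted');
    } else {
      req.continue();
    }
  });
  await page.setRequestInterception(true);
  await page.goto(url);
})();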

This is the recommended way to implement navigation locks.
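
One detail that is easy to trip over in scripts like this: Chrome reports request URLs in normalized form (for example with a trailing slash on the root path), so a plain string comparison against an un-normalized URL can end up aborting the initial navigation as well. Below is a sketch of the same check as a standalone predicate that normalizes both sides first. The function name shouldAbortNavigation is hypothetical; it relies only on the request and page methods already used above, plus the standard URL class.

```javascript
// Hypothetical predicate: true when `req` is a main-frame navigation to
// anything other than `allowedUrl`. Both URLs go through the WHATWG URL
// parser so that 'https://example.com' and 'https://example.com/' match.
function shouldAbortNavigation(req, page, allowedUrl) {
  if (!req.isNavigationRequest() || req.frame() !== page.mainFrame()) {
    return false;
  }
  return new URL(req.url()).href !== new URL(allowedUrl).href;
}
```

Inside the request handler this would read: shouldAbortNavigation(req, page, url) ? req.abort('aborted') : req.continue().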

panthony

panthony commented on Mar 15, 2019

@panthony

@aslushnikov OK, I started returning 204 because aborting the navigation request made goto fail:

#3421 (comment)

Guess I'll give abort a try again :)

aslushnikov

aslushnikov commented on Mar 15, 2019

@aslushnikov
Contributor

@panthony let me know how it goes!

rymdhund

rymdhund commented on Apr 5, 2019

@rymdhund

@aslushnikov The code you linked times out for me every time when I change example.com to a page that does window.location = 'http://google.com'. Both on 1.13.0 and 1.14.0. Maybe it's the interception that is causing trouble again?

const puppeteer = require('puppeteer');

(async() => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();

  const url = 'http://example.com';

  page.on('request', req => {
    console.log(`request: ${req.url()}`);
    if (req.isNavigationRequest() && req.frame() === page.mainFrame() && req.url() !== url) {
      console.log('abort');
      req.abort('aborted');
    } else {
      req.continue();
    }
  });
  await page.setRequestInterception(true);
  await page.goto(url)
})();

aslushnikov

aslushnikov commented on Apr 5, 2019

@aslushnikov
Contributor

@rymdhund not sure, the following works just fine for me.

Note: page.goto will throw net::ERR_ABORTED since you abort the navigation, so you should handle the exception.

const puppeteer = require('puppeteer');

(async() => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();

  const url = 'http://example.com';

  page.on('request', req => {
    console.log(`request: ${req.url()}`);
    if (req.isNavigationRequest() && req.frame() === page.mainFrame() && req.url() !== url) {
      console.log('abort');
      req.abort('aborted');
    } else {
      req.continue();
    }
  });
  await page.setRequestInterception(true);
  await page.goto(url)
})();
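
Handling the exception mentioned in the note could look like the following sketch. The wrapper name gotoAllowingAbort is hypothetical, and it assumes that the rejection message contains the text ERR_ABORTED, which is an assumption about the error string rather than a documented contract. Any other failure is rethrown.

```javascript
// Hypothetical wrapper around page.goto for use with a navigation lock.
// Returns the response on success and null when the navigation was aborted.
async function gotoAllowingAbort(page, url, options) {
  try {
    return await page.goto(url, options);
  } catch (error) {
    // Assumption: an aborted navigation surfaces as an error whose
    // message mentions ERR_ABORTED.
    if (error && /ERR_ABORTED/.test(String(error.message))) {
      return null;
    }
    throw error; // timeouts and other failures still propagate
  }
}
```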

panthony

panthony commented on May 14, 2019

@panthony

@aslushnikov Aborting the navigation request as suggested above will make the goto hang if you happen to use waitUntil: [ 'networkidle2' ].

DCtheTall

DCtheTall commented on Mar 25, 2020

@DCtheTall

Hey all, now that Chrome is planning on deprecating the Network.* Devtools Protocol API, has anyone found a way to do this using the new Fetch Devtools API?
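
Nobody in the thread posted a Fetch-based version, so the following is only an untested sketch of how the same lock might look on the Fetch domain through a raw CDP session. The function name lockNavigationWithFetch is hypothetical, and `client` is assumed to be a session obtained with something like await page.target().createCDPSession(). It treats a paused request as a main-frame navigation when its resourceType is Document and its frameId matches the root of Page.getFrameTree, which is an approximation and may not cover every case. Combining it with puppeteer's own page.setRequestInterception on the same page may conflict.

```javascript
// Hypothetical sketch: a navigation lock built on the DevTools Fetch domain.
// `client` is a CDP session; `allowedUrl` is the only document URL allowed
// to load in the main frame.
async function lockNavigationWithFetch(client, allowedUrl) {
  const allowed = new URL(allowedUrl).href; // normalize, e.g. add trailing slash
  // The root of the frame tree is the page's main frame.
  const { frameTree } = await client.send('Page.getFrameTree');
  const mainFrameId = frameTree.frame.id;

  client.on('Fetch.requestPaused', event => {
    const isForeignMainFrameDocument =
      event.resourceType === 'Document' &&
      event.frameId === mainFrameId &&
      event.request.url !== allowed;
    const reply = isForeignMainFrameDocument
      ? client.send('Fetch.failRequest', { requestId: event.requestId, errorReason: 'Aborted' })
      : client.send('Fetch.continueRequest', { requestId: event.requestId });
    // The target may already be gone when the reply is sent; ignore that.
    reply.catch(() => {});
  });

  // Pause only document requests, before they are sent to the network.
  await client.send('Fetch.enable', {
    patterns: [{ urlPattern: '*', resourceType: 'Document', requestStage: 'Request' }],
  });
}
```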
